Mathematical Programming Models for Balancing Data Quality and Confidentiality in Tabular Data
Abstract
1. Mathematical Programming Model for Controlled Tabular Adjustment (CTA)
Statistical agencies use a variety of methods to protect the confidentiality of tabular data. The most widely used, complementary cell suppression, suppresses both primary (sensitive) and secondary (non-sensitive) cells to assure confidentiality. Despite its popularity, it has severe limitations. First, the complementary cell suppression problem is NP-hard and therefore computationally difficult to solve. Second, it produces tables with missing data, which many end users find difficult to analyse statistically; for example, they cannot easily estimate means or variances from such data. Finally, published data protected by cell suppression can still be susceptible to disclosure. Dandekar and Cox (2002) proposed controlled tabular adjustment (CTA), which overcomes many of the problems associated with traditional cell suppression and other perturbation methods. CTA assures confidentiality by setting each sensitive cell to either of its predefined protection limits, and it preserves table additivity by making the best mutual use of sensitive-cell and non-sensitive-cell adjustments. It can also control adjustments at the individual cell level. CTA can satisfy several types of statistical goals simultaneously. For example, Cox and Kelly (2003, 2004) demonstrate how CTA can be used to preserve univariate and bivariate properties of tabular data. The end result is that users receive clean, complete, and statistically accurate tabular data. Figure 1 shows a mixed-integer programming model for CTA.
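To make the CTA idea concrete, the following is a minimal sketch on a deliberately simplified case; it is not the model of Figure 1. It assumes a single additive row whose published total stays fixed, unweighted absolute adjustments as the objective, and brute-force enumeration of the up/down choices in place of a mixed-integer solver. The function name `cta_one_row` and its parameters are hypothetical and chosen only for illustration.

```python
from itertools import product


def cta_one_row(values, sensitive, caps):
    """Brute-force CTA sketch for one additive row with a fixed total.

    values:    original cell values
    sensitive: {index: (lower_protection, upper_protection)} for sensitive cells
    caps:      {index: max_abs_adjustment} for non-sensitive cells
    Returns (adjusted_values, total_absolute_adjustment), or None if infeasible.
    """
    sens_idx = sorted(sensitive)
    best = None
    # Each sensitive cell moves either down (0) or up (1) to its protection limit.
    for choice in product((0, 1), repeat=len(sens_idx)):
        adj = [0.0] * len(values)
        for i, up in zip(sens_idx, choice):
            lower, upper = sensitive[i]
            adj[i] = upper if up else -lower
        # Non-sensitive cells absorb the net change so the row total is unchanged.
        residual = -sum(adj)
        for j, cap in caps.items():
            step = max(-cap, min(cap, residual))
            adj[j] = step
            residual -= step
        if abs(residual) > 1e-9:
            continue  # the non-sensitive cells cannot absorb this combination
        cost = sum(abs(a) for a in adj)
        if best is None or cost < best[1]:
            best = ([v + a for v, a in zip(values, adj)], cost)
    return best


# Illustrative data (invented for this sketch): cells 0 and 2 are sensitive.
values = [20, 50, 10, 20]
sensitive = {0: (4, 4), 2: (2, 3)}
caps = {1: 5, 3: 5}
print(cta_one_row(values, sensitive, caps))
```

With a single additivity constraint and unit weights, any feasible assignment of the compensating adjustments that all share the sign of the residual is cost-minimal, which is why a greedy fill suffices here. A table with several interlocking row and column constraints would instead need a linear or mixed-integer solver.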
Related resources
[in]appropriate Use of Statistical Measures in [the Name Of] Balancing Data Quality and Confidentiality of Tabular Format Magnitude Data
Statisticians are aware that measures such as the mean, variance, and Pearson correlation coefficient are disproportionately influenced by a relatively small number of extremely large observations and are therefore unreliable for comparing the overall quality of data with an extremely skewed distribution. Tabular data cells follow an extremely skewed distribution. In this paper we s...
United Nations Statistical Commission and European Commission Economic Commission for Europe Statistical Office of the Conference of European Statisticians European Communities (eurostat) Joint Ece/eurostat Work Session on Statistical Data Confidentiality Balancing Data Quality and Confidentiality for Tabular Data Invited Paper
1. Tabular data are the earliest form and remain a staple of official statistics data products. Familiar examples of tabular data products in official statistics include count data such as age-race-sex and other demographic data, concentration (or percentage) data such as in financial or energy utilization statistics, and magnitude data such as total retail sales or air pollution data. Confidentia...
Exact, Heuristic and Metaheuristic Methods for Confidentiality Protection by Controlled Tabular Adjustment
Government agencies and commercial organizations that report data face the task of representing the data meaningfully while simultaneously protecting the confidentiality of critical data components. The challenge is to organize and disseminate data in a form that prevents these components from being unmasked by corporate espionage, or falling prey to efforts to penetrate the security of the inf...
Integrated exact, hybrid and metaheuristic learning methods for confidentiality protection
A vital task facing government agencies and commercial organizations that report data is to represent the data in a meaningful way and simultaneously to protect the confidentiality of critical components of this data. The challenge is to organize and disseminate data in a form that prevents such critical components from being inferred by groups bent on corporate espionage, to gain competitive a...
A mathematical model for balancing (cost-time-quality and environmental risks) in oil and gas projects and solving it by multi-objective Bee Colony Algorithm
Today, large projects such as the construction of oil, gas and petrochemical refineries inevitably require modern management and project-scheduling methods. In classical scheduling, on the other hand, the focus is on balancing the time and cost of carrying out a project, and in that setting one possible way to shorten project duration is to accelerate activities. T...